Abstract

The evolutionary history of certain species such as polyploids is modeled by a generalization of phylogenetic trees called
multi-labeled phylogenetic trees, or MUL trees for short. One problem that relates to inferring a MUL tree is how to
construct the smallest possible MUL tree that is consistent with a given set of rooted triplets, or the SMRT problem for short.
This problem is NP-hard. There is one algorithm for the SMRT problem which is exact and runs in $O(7^n)$ time, where $n$ is the
number of taxa. In this paper, we show that SMRT does not seem to be an appropriate solution from the biological point
of view. Indeed, we present a heuristic algorithm named MTRT for this problem and execute it on some real and simulated
datasets. The results of MTRT show that triplets alone cannot provide enough information to infer the true MUL tree. So, it is
inappropriate to infer a MUL tree using triplet information alone and considering the minimum number of duplications.
Finally, we introduce some new problems which are more suitable from the biological point of view.

Introduction

MUL trees are rooted phylogenetic trees in which some leaves are
labeled by the same taxon. They find applications in the study of the
evolution of polyploids. Other applications of MUL trees
include molecular systematics, biogeography, the study of host-parasite
cospeciation and computer science [8,11,15,18-20,22]. In
this paper we focus on rooted binary MUL trees. Several
algorithms for constructing MUL trees from various datasets have been
introduced. Examples include building consensus MUL trees
[6,14,15], constructing a phylogenetic network from a MUL tree
[10] and transforming a collection of MUL trees into a collection
of evolutionary trees [23]. One of the problems in the field of
inferring MUL trees is to construct a smallest possible MUL tree
consistent with a given set of rooted triplets, or the SMRT problem for
short. SMRT has been proved to be NP-hard [9]. Up to
now, a number of algorithms for inferring a phylogenetic tree or
network from a set of triplets have been presented [1,4,12,13,24-26].
However, there is only one algorithm for constructing a smallest
possible MUL tree from a set of triplets [9]. This algorithm is exact
and runs in $O(7^n)$ time, where $n$ is the number of taxa. Here, we
present the MTRT algorithm, which is a heuristic method for the
SMRT problem. MTRT is based on the algorithm of Aho et al.
presented in [1], a top-down algorithm
that constructs a rooted tree consistent with a given set of triplets, if
such a tree exists. In the MTRT algorithm, we modify Aho et
al.'s algorithm to construct a MUL tree that is consistent with a
given set of triplets and has as few duplications as possible.
Duplication in a MUL tree is defined in the next section. We
tested the performance of the MTRT algorithm on more than 400
biological and simulated datasets and showed that MTRT is
efficient and can often find the optimal answer in practice.
Furthermore, we showed that minimizing the number of
duplications may not be an appropriate criterion for inferring a
MUL tree.

Preliminaries 

A rooted triplet, or triplet for short, is a binary rooted tree on
three distinct taxa. A triplet on three taxa $i$, $j$ and $k$ is denoted by
$ij|k$ if the lowest common ancestor of $i$ and $j$ is a proper
descendant of that of $i$ and $k$, or $j$ and $k$. Let $\mathcal{R}$ be a set of triplets
on a taxa set $L$. For any subset $L'$ of $L$, the set of all triplets
$ij|k \in \mathcal{R}$ for which $i, j, k \in L'$ is called the set of triplets induced
by $L'$ and is denoted by $\mathcal{R}|_{L'}$. We also set
$\mathcal{R}(L, L') := \{ab|c \in \mathcal{R}|_{L} : \text{either } a, b \in L' \text{ or } c \in L'\}$.
A triplet $ij|k$ and a
MUL tree $M$ are said to be consistent if $ij|k$ is an embedded
subtree of $M$. We say that a MUL tree $M$ and a given set $\mathcal{R}$ of
rooted triplets are consistent if every triplet in $\mathcal{R}$ is consistent with
$M$. The set $\mathcal{R}(M)$ of all triplets consistent with $M$ is called the
triplet encoding of $M$. The following definitions are taken from
[9]:

For any MUL tree $M$, denote the set of all leaf labels that occur
in $M$ by $L(M)$. For any leaf label $x \in L(M)$, the number of
duplications of $x$ is equal to the number of occurrences of $x$ in $M$
minus 1. The number of leaf duplications in $M$, denoted by $d(M)$,
is the total number of duplications of all leaf labels in $L(M)$.
Define $m(M)$ as the number of leaves in $M$. Then,
$d(M) = m(M) - |L(M)|$. Now, we consider the following problem,
called the smallest MUL tree from rooted triplets problem, or
SMRT for short:

Figure 1. An original MUL tree used to test the MTRT algorithm.

doi:10.1371/journal.pone.0103622.g001



SMRT problem. Given a set $\mathcal{R}$ of rooted triplets over a leaf
label set $L$, output a MUL tree $M$ with $L(M) = L$ which is
consistent with $\mathcal{R}$ and minimizes $d(M)$.
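The quantities in the SMRT problem can be made concrete with a small Python sketch. It assumes a MUL tree is stored as nested tuples of leaf labels and a triplet $ab|c$ as the pair `(frozenset({a, b}), c)`. Both representations and all function names are illustrative choices, not part of the paper's MATLAB implementation. The triplet set is the one used in the worked example of Materials and Methods, and the tree `M` is a hand-picked MUL tree on the same labels.

```python
from itertools import combinations

def leaves(tree):
    """Leaf labels of a nested-tuple tree, with repetitions."""
    if not isinstance(tree, tuple):
        return [tree]
    return [x for child in tree for x in leaves(child)]

def duplications(tree):
    """d(M) = m(M) - |L(M)|: number of leaves minus number of distinct labels."""
    labels = leaves(tree)
    return len(labels) - len(set(labels))

def triplets(tree):
    """Triplet encoding R(M): every ab|c embedded in the tree."""
    if not isinstance(tree, tuple):
        return set()
    found = set()
    per_child = [leaves(child) for child in tree]
    for i, inside in enumerate(per_child):
        outside = [x for j, other in enumerate(per_child) if j != i for x in other]
        # a and b meet strictly below this node, c joins them only at this node
        for a, b in combinations(inside, 2):
            for c in outside:
                if len({a, b, c}) == 3:
                    found.add((frozenset((a, b)), c))
    for child in tree:
        found |= triplets(child)
    return found

def is_consistent(tree, triplet_set):
    """True if every triplet of triplet_set is embedded in the tree."""
    return set(triplet_set) <= triplets(tree)

# Hypothetical MUL tree in which label 3 occurs twice.
M = ((1, (2, 3)), ((3, 4), 5))
R = {(frozenset(p), c) for p, c in [
    ((1, 2), 3), ((1, 3), 4), ((2, 3), 1), ((3, 4), 1),
    ((3, 5), 2), ((3, 4), 5), ((4, 5), 1), ((4, 5), 2)]}

print(duplications(M))      # 1
print(is_consistent(M, R))  # True
```

An SMRT solver would search over all MUL trees for which `is_consistent` holds and return one with the smallest `duplications` value; the sketch only evaluates a given candidate.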

Results 

Simulation data 

In this section, we report the results of our simulation study. For
all data, the MTRT algorithm was run on a laptop with a
1.8 GHz dual-core processor and 1 GB RAM. MTRT is
implemented in MATLAB. To test the performance of the
algorithm, we simulated 400 MUL trees with the Mesquite program
[16]. This program can simulate and analyze gene trees from
multiple populations. Three components must be established in
Mesquite to do this:

1. A block of taxa representing the gene sequences.

2. A block of taxa representing the species (or populations). 

3. A taxa association block, which is a special block of information 
that indicates how the taxa representing genes are associated 
with the taxa representing species. 

Once these three components are established, Mesquite 
simulates gene trees by a coalescent process. The simulation starts 
at each extant population. Within each, the ancestry of the gene 
copies contained (as specified by the Taxa Association) is simulated 
by coalescence, going backward in time until the simulation arrives 
at the previous population (species) divergence. Mesquite makes 
this reconstruction under one assumption: that the only process 
occurring is gene duplication or extinction. Thus, the reconstruc- 
tion reconciles the gene tree into the population tree so as to 
minimize the depths of gene tree divergences, which also 
minimizes gene duplication or extinction events, see [16] for 
more details. 

Now we describe the procedure for simulating MUL trees.
Suppose the gene tree GT produced by Mesquite has n taxa. We
chose the number of taxa for the species tree ST associated
with GT to be between n/2 and n. Then, we randomly assigned
the taxa representing genes to the taxa
representing species to obtain a taxa association block. After the
simulation of the gene tree, to obtain a MUL tree, we replaced
each gene by the species to which it belongs. In all simulations, we
considered n between 5 and 50. For each simulated MUL tree, we
extracted all its triplets and applied the MTRT algorithm to the
triplet set. The results show that in 42 percent of the datasets,
MTRT produces a MUL tree which has fewer
duplications than the original MUL tree. In only 10
percent of the datasets, the number of duplications of the output
MUL tree of MTRT is greater than that of the original MUL tree.
For the remaining 48 percent, the numbers of duplications of both
MUL trees are the same. Hence, in 90 percent of the datasets,
MTRT constructs a MUL tree that has at most as many
duplications as the original MUL tree. The
minimum, maximum and average running times of the algorithm
on the 400 simulated datasets are 0.017, 40.36 and 9.1 seconds
respectively. Figure 1 shows a simulated MUL tree. The output of
MTRT for the triplet set extracted from this MUL tree is given
in Figure 2. The output MUL tree has one duplication while the
original MUL tree has two duplications. We also compared MTRT
with the exact algorithm presented in [9]. Since the exact
algorithm requires exponential time and space, we could only run
it on 100 small datasets which have 5 to 10 taxa. In 86
datasets, the MUL trees produced by MTRT and by the exact
algorithm have the same number of duplications. This shows that MTRT in
many cases produces the smallest MUL trees for the triplet sets.
For further study, we analysed the results of the exact algorithm.
We found that, in 56 datasets, the exact algorithm produces a
MUL tree which has fewer duplications than the
original MUL tree.

Real data 

To test the performance of MTRT on real biological
datasets, we applied MTRT to three datasets. The first and
second datasets contain high-polyploid North American and
Hawaiian violets [17]. All major morphological groups occurring
in North America were sampled.

Figure 2. The MUL tree obtained by applying MTRT to the triplets extracted from the MUL tree shown in Figure 1.

doi:10.1371/journal.pone.0103622.g002



All sequences were aligned with
MUSCLE [7] and phylogenies were constructed using maximum
likelihood. The third dataset, containing the flowering plant genus
Silene (Caryophyllaceae), was published in [21]. The gene trees in
[21] were reconstructed using standard techniques in phylogenetic
analysis from regions of the nuclear RNA polymerase gene family,
two concatenated chloroplast regions and one nuclear ribosomal
region; see [10] for more details. For each original MUL tree, we
extracted all triplets and then applied MTRT to these triplets. In all
cases, MTRT constructs a MUL tree which has fewer
duplications than the original MUL tree. The original
MUL trees for the first and second datasets have 13 and 20
duplications, whereas the MUL trees produced by MTRT have
11 and 18 duplications respectively. Due to limitations of space,
only the MUL trees associated with one of these datasets are shown. Figure 3
and Figure 4 show the original MUL tree and the MUL tree
constructed by MTRT for the triplet set extracted from the
original MUL tree respectively. The original MUL tree for the third
dataset has 7 duplications, whereas the MUL tree produced by
MTRT has 5 duplications. Figure 5 and Figure 6 show the
original MUL tree and the MUL tree constructed by MTRT
respectively. The labels represent Silene species, namely, S.
ajanensis (A), S. uralensis (U), S. involucrata (I), S. sorensenis (S),
S. ostenfeldii (O), S. zawadskii (Z), S. linnaeana (L), S. uralensis
(Mongolia) (UM), S. samojedora (SAM), S. villosula (V), S.
sachalinensis (SAC) and S. tolmatchevii (T).

Reconstruction accuracy 

For a phylogeny reconstruction algorithm, if a certain tree or
network is used to obtain the input data, the algorithm should
return exactly this tree or network. This is an important property
for reconstructing phylogenies and is known as the consistency
principle. In the previous section, we observed that, for half of the
simulated datasets and two real datasets, the numbers of
duplications of the input and output MUL trees are different.
Further investigation showed that although some output MUL
trees differ from the input MUL trees, the outputs are consistent with
all triplets corresponding to the input MUL trees. In addition, we
observed that some output MUL trees have more triplets than the
corresponding input MUL trees. These observations show that
inferring a MUL tree by minimizing the number of duplications
may not properly detect biological properties and evolutionary
relationships. So, there is a deficiency in the SMRT problem from
a biological point of view. For further analysis, we used the rooted triplet
distance, a concept already defined for trees, to compare the output
MUL trees with the input MUL trees [5].

Definition 1. The rooted triplet distance between two rooted
phylogenetic trees $T_1$ and $T_2$ on taxa set $X$ is defined as

$TD(T_1, T_2) = \frac{1}{2} |\mathcal{R}(T_1) \Delta \mathcal{R}(T_2)|$,



where $\Delta$ is the symmetric difference between two sets. For
example, for the two MUL trees $M_1$ and $M_1'$ shown in Figure 7a
and Figure 7b respectively, $M_1'$ is consistent with all triplets in $\mathcal{R}(M_1)$
and has fewer duplications than $M_1$. Since $M_1'$ satisfies an extra
triplet 23|1 which is not contained in $\mathcal{R}(M_1)$,
$TD(M_1, M_1') = 0.5$. This shows that an
algorithm can satisfy all conditions of the SMRT problem and still not
return the correct MUL tree, that is, it does not satisfy the
consistency principle of phylogeny reconstruction algorithms.
Now, consider another example: the MUL trees $M_2$ and $M_2'$
shown in Figure 7c and Figure 7d respectively. These MUL trees
have the same number of duplications and $\mathcal{R}(M_2) = \mathcal{R}(M_2')$, that
is, $TD(M_2, M_2') = 0$. But these are different MUL trees because
they have different duplicated leaves and different clusters.
This situation arises because in a MUL tree a triplet may
occur several times. For example, the triplet 12|3 occurs three
times in the MUL tree shown in Figure 8. This is exactly what
happens in Figure 7c and Figure 7d. For instance, the
triplet 12|4 occurs once in $M_2$ whereas it occurs twice in $M_2'$.

Figure 3. An original MUL tree on violet species with 20 duplications.

doi:10.1371/journal.pone.0103622.g003

Figure 4. The MUL tree obtained by applying MTRT to the triplets extracted from the MUL tree shown in Figure 3. This MUL tree has
18 duplications.

doi:10.1371/journal.pone.0103622.g004

Figure 5. An original MUL tree on flowering plants with 7 duplications.

Hence, the rooted triplet distance introduced in Def. 1 does not
properly show the distance between two MUL trees.
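The set-based distance of Def. 1 is short enough to sketch in Python. The triplet sets below are hypothetical stand-ins, not the encodings of the trees in Figure 7: the second set simply contains one extra triplet, mirroring the situation described for $M_1$ and $M_1'$. The helper names are illustrative.

```python
def triplet(a, b, c):
    """Triplet ab|c; the pair {a, b} is unordered, so 12|3 equals 21|3."""
    return (frozenset((a, b)), c)

def triplet_distance(r1, r2):
    """TD = |R1 symmetric-difference R2| / 2 on plain sets (Def. 1)."""
    return len(set(r1) ^ set(r2)) / 2

r_first = {triplet(1, 2, 3)}
r_second = {triplet(1, 2, 3), triplet(2, 3, 1)}  # one extra triplet, 23|1

print(triplet_distance(r_first, r_second))  # 0.5
print(triplet_distance(r_first, r_first))   # 0.0
```

Because the inputs are sets, a triplet that is embedded several times in a MUL tree is counted only once, which is why two different MUL trees can be at distance 0.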

Figure 6. The MUL tree obtained by applying MTRT to the triplets extracted from the MUL tree shown in Figure 5. This MUL tree has 5
duplications.

doi:10.1371/journal.pone.0103622.g006

Figure 7. Comparing MUL trees using triplet distance. (a) The MUL tree $M_1$. (b) The MUL tree $M_1'$ is consistent with $\mathcal{R}(M_1)$. The MUL tree $M_1'$
has fewer duplications than $M_1$ and is consistent with the triplet 23|1 which is not contained in $\mathcal{R}(M_1)$. So, $TD(M_1, M_1') = 0.5$. (c) The MUL tree $M_2$. (d)
The MUL tree $M_2'$ is consistent with $\mathcal{R}(M_2)$. The MUL trees $M_2$ and $M_2'$ have the same number of duplications and $TD(M_2, M_2') = 0$.
doi:10.1371/journal.pone.0103622.g007

A multiset is defined as a 2-tuple $(Y, m)$ where $Y$ is some set
and $m$ is a function from $Y$ to the positive natural numbers $\mathbb{N}$.
The set $Y$ is called the underlying set of elements. For each $y \in Y$,
the multiplicity $m(y)$ is defined to be the number of occurrences of
$y$. The symmetric difference between two multisets $(Y_1, m_1)$ and
$(Y_2, m_2)$ is denoted by $(Y_1, m_1) \Delta_m (Y_2, m_2) := \{x \in Y_1 \cup Y_2 : m(x) \neq 0\}$, where

$$m(x) = \begin{cases} |m_1(x) - m_2(x)| & x \in Y_1 \cap Y_2 \\ m_1(x) & x \in Y_1 - Y_2 \\ m_2(x) & x \in Y_2 - Y_1 \end{cases}$$

We also define the size of a multiset $(Y, m)$ as

$|(Y, m)| := \sum_{y \in Y} m(y)$.

For example, consider the two multisets {1, 1, 1, 2, 3, 3, 4} and {1, 1, 2, 2, 2, 3, 3, 5, 5}. The symmetric
difference between these multisets and its size are {1, 2, 2, 4, 5, 5} and
6 respectively. For a MUL tree $M$, let $\mathcal{R}'(M) = (\mathcal{R}(M), m)$ be the
triplet encoding multiset of $M$. This means that if a triplet is seen in
the MUL tree $k$ times, then $\mathcal{R}'(M)$ contains this triplet $k$ times.
We define the new triplet distance between two MUL trees as
follows:

Definition 2.

(a) The rooted triplet distance between two rooted phylogenetic
MUL trees $M_1$ and $M_2$ on taxa set $X$ is defined as

$TD_m(M_1, M_2) = |\mathcal{R}'(M_1) \Delta_m \mathcal{R}'(M_2)|$.

(b) The rooted triplet distance between a rooted phylogenetic
MUL tree $M'$ and a multiset of triplets $\mathcal{R}$ on taxa set $X$ is
defined as

$TD_m(M', \mathcal{R}) = |\mathcal{R}'(M') \Delta_m \mathcal{R}|$.

(c) The rooted triplet distance between two multisets of triplets
$\mathcal{R}_1$ and $\mathcal{R}_2$ on taxa set $X$ is defined as

$TD_m(\mathcal{R}_1, \mathcal{R}_2) = |\mathcal{R}_1 \Delta_m \mathcal{R}_2|$.

Figure 8. A MUL tree which has three different triplets 12|3.

doi:10.1371/journal.pone.0103622.g008
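The multiset symmetric difference and the distance of Def. 2(c) can be sketched with Python's `collections.Counter`. The first part reproduces the numeric multiset example from the text. The triplet multisets at the end are hypothetical and written as plain strings for brevity. The function names are illustrative.

```python
from collections import Counter

def multiset_symmetric_difference(m1, m2):
    """Multiset in which x has multiplicity |m1(x) - m2(x)|.

    An element missing from one side has multiplicity 0 there, which covers
    the three cases of the piecewise definition at once.
    """
    diff = Counter()
    for x in set(m1) | set(m2):
        k = abs(m1[x] - m2[x])
        if k:
            diff[x] = k
    return diff

def td_m(r1, r2):
    """Size of the multiset symmetric difference of two triplet multisets."""
    return sum(multiset_symmetric_difference(Counter(r1), Counter(r2)).values())

y1 = Counter([1, 1, 1, 2, 3, 3, 4])
y2 = Counter([1, 1, 2, 2, 2, 3, 3, 5, 5])
d = multiset_symmetric_difference(y1, y2)
print(sorted(d.elements()))  # [1, 2, 2, 4, 5, 5]
print(sum(d.values()))       # 6

# Same underlying set of triplets, different multiplicities.
print(td_m(["12|3", "12|3", "23|1"], ["12|3", "23|1"]))  # 1
```

The last call shows the point of the new distance: the two inputs have identical underlying sets, so the set-based distance of Def. 1 would be 0, while the multiset distance is 1.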

Using the new rooted triplet distance $TD_m$ defined in Def. 2,
the distance between the MUL trees $M_2$ and $M_2'$ shown in Figure 7
equals $TD_m(M_2, M_2') = 56$. Note that a MUL tree is not uniquely
defined by its multiset of triplets. For example, the two MUL trees
shown in Figure 9 have the same multiset of triplets. However, it
seems that for most MUL trees, especially large ones,
two MUL trees are isomorphic if their new
triplet distance $TD_m$ equals 0. To show this, we computed the
triplet distance $TD$ and the new triplet distance $TD_m$ for all
simulated and real datasets. The results for the simulated datasets are
shown in Table 1. Suppose $M_{in}$ is a MUL tree and $M_{out}$ is the
result of applying the MTRT algorithm to $\mathcal{R}(M_{in})$. We define
$RD(M_{in}, M_{out}) := d(M_{out}) - d(M_{in})$. We classify the simulated
datasets into 5 classes:

• $A := \{\text{datasets} : TD(M_{in}, M_{out}) = 0\}$,

• $B := \{\text{datasets} : RD(M_{in}, M_{out}) < 0\}$,

• $C := \{\text{datasets} : RD(M_{in}, M_{out}) = 0\}$,

• $D := \{\text{datasets} : RD(M_{in}, M_{out}) > 0\}$,

• $E := \{\text{datasets} : TD_m(M_{in}, M_{out}) = 0\}$.

Table 1 shows the intersections of the above sets. For example, in
100 datasets, MTRT produces a MUL tree which has fewer
duplications than the input MUL tree and the corresponding
triplet distance is 0. In 74 datasets, the output and input MUL
trees have the same number of duplications and the new distance
between them is 0. We studied these 74 datasets and found that
their corresponding output and input MUL trees are exactly the
same. We also examined the exact algorithm on the 100 datasets
mentioned in the Results section. The results show that in 56 datasets,
the exact algorithm produces MUL trees which have fewer
duplications than the original MUL tree. For the
remaining datasets, the numbers of duplications of both MUL
trees are the same. This shows that in more than fifty percent of
the cases, the MUL tree produced by the exact algorithm is
different from the input MUL tree. We also computed $TD$ and
$TD_m$ for the real datasets. For the first real dataset, $TD$ is 98, that is,
the output MUL tree has 196 triplets which are not contained in the
input triplet set. $TD_m$ for this dataset is 2573. For the second real dataset,
$TD$ is 76.5, that is, the output MUL tree has 153 triplets which
are not contained in the input triplet set. $TD_m$ for this dataset is 6151.
For the third dataset, $TD$ and $TD_m$ are 2 and 255 respectively. These
numbers and Table 1 show that in many cases the SMRT
problem and its conditions do not satisfy the consistency principle.
Hence, in many cases, algorithms based on SMRT fail to
produce the exact MUL tree.

Discussion and Future Works 

Figure 9. Two different MUL trees with the same multiset of triplets {12|3, 23|1}.

doi:10.1371/journal.pone.0103622.g009

In this paper, we presented a heuristic algorithm, MTRT, for the
SMRT problem. MTRT is implemented in MATLAB and is
available at http://bs.ipm.ir/softwares/MTRT/. The goal of the
algorithm is to construct a minimal MUL tree that is consistent
with the input set of triplets and minimizes the number of its
duplications. Note that a phylogenetic network can be associated
with a MUL tree [14]. Therefore, it seems that constructing the
smallest MUL tree from a set of triplets could be an alternative
method for the problem of constructing a phylogenetic network
with minimum reticulation from a set of triplets. To test the
performance of MTRT, we applied it to 400 simulated MUL
trees and three real datasets. For each simulated and real MUL
tree, we extracted all its triplets and applied the MTRT algorithm
to the triplet set. We have shown that in most cases MTRT
works well and has an acceptable running time. In only 10 percent
of the datasets, the number of duplications of the output MUL
tree of MTRT is greater than that of the original MUL tree. We
also compared MTRT with the exact algorithm. To do this, we
executed the exact algorithm on 100 datasets. We showed that, in
86 datasets, the MUL trees produced by MTRT and by the exact
algorithm have the same number of duplications. We found that in more
than 50 percent of the cases, the exact algorithm produces an
output which is different from the input. This shows that the SMRT
problem does not satisfy the consistency principle. So, having the
set of triplets consistent with a MUL tree is not enough to infer that
MUL tree. Furthermore, minimizing the number of
duplications when reconstructing a MUL tree that is consistent with a
given set of triplets is not appropriate for inferring the correct MUL
tree. Therefore, from a biological point of view, there is a
deficiency in the SMRT problem. Equivalently, the problem of
constructing a phylogenetic network with minimum reticulation
from a set of triplets does not satisfy the consistency principle
of phylogeny reconstruction methods. It is necessary to consider
other conditions to obtain proper MUL trees or phylogenetic
networks. We extended the definition of the triplet distance $TD$ and
introduced a new triplet distance $TD_m$. For all datasets, we
compared the output MUL tree with the original MUL tree using
$TD_m$. For all datasets with $TD_m = 0$, we showed that the
output and original MUL trees are the same. According to these
observations, we propose the following problem, called MUL tree
from a multiset of rooted triplets with minimum triplet distance, or
mMTd for short:

mMTd problem. Given a multiset $\mathcal{R}$ of rooted triplets over a
leaf label set $L$, output a MUL tree $M$ which minimizes
$TD_m(M, \mathcal{R})$.

Note that the maximum rooted triplets consistency problem, or
MRTC for short [4], is a special case of the mMTd problem. A
natural question is how a multiset can be generated from
biological data. For example, in the study of area cladograms,
suppose a set of triplets is produced and we are interested in
replacing organisms by area names. Or, in another field, suppose we
want to replace parasites by their hosts. Thus, a multiset of triplets
may be derived from a great variety of biological processes.

We can simply extend the definition of the new triplet distance
to a phylogenetic network. Hence, another problem can be
defined as follows, called Network from a multiset of rooted triplets
with minimum triplet distance, or nMTd for short:

nMTd problem. Given a multiset $\mathcal{R}$ of rooted triplets over a
leaf label set $L$, output a network $N$ which minimizes $TD_m(N, \mathcal{R})$.

Materials and Methods 

Table 1. The results of MTRT algorithm on simulated datasets.
$E := \{\text{datasets} : TD_m(M_{in}, M_{out}) = 0\}$ where $M_{in}$ is a MUL tree and $M_{out}$ is the result of applying the MTRT algorithm to $\mathcal{R}(M_{in})$.
doi:10.1371/journal.pone.0103622.t001

This section describes a heuristic method, MTRT, that aims to
solve the SMRT problem. We first define the concept of a
separating set in a graph. Consider a graph $G = (V, E)$. The
subgraph $G[U]$ induced by $U \subseteq V$ has vertex set $U$ and an
induced edge set $E|_U$ that consists of all edges in $G$ both of whose
endpoints lie in $U$. Suppose $G$ is a connected graph. The set $S \subset V$
is called a separator, or a separating set, of $G$ if $G[V \setminus S]$ is
disconnected. Now, let $\mathcal{R}$ denote a given set of triplets over a leaf
label set $L$. MTRT tries to build a MUL tree $M$ which is
consistent with $\mathcal{R}$ and whose number of leaf duplications $d(M)$ is as small as
possible. MTRT is based on Aho et al.'s algorithm [1]. It requires the
auxiliary graph, denoted by $AG(\mathcal{R})$, which is a graph
corresponding to $\mathcal{R}$ with vertex set $L$ and edge set $E$ such that:

$\forall a, b \in L : e = \{a, b\} \in E \iff \exists c \in L \text{ s.t. } ab|c \in \mathcal{R}$.
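The definition of the auxiliary graph translates directly into a few lines of Python. In this sketch a triplet $ab|c$ is written as the tuple `(a, b, c)`. That encoding and the function names are illustrative assumptions. The triplet set is the one from the worked example later in this section.

```python
def auxiliary_graph_edges(triplets):
    """Edge set of AG(R): {a, b} is an edge iff some triplet ab|c is in R."""
    return {frozenset((a, b)) for (a, b, _c) in triplets}

def is_connected(labels, edges):
    """Depth-first search over the graph (labels, edges)."""
    labels = set(labels)
    adj = {v: set() for v in labels}
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    start = next(iter(labels))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == labels

R = [(1, 2, 3), (1, 3, 4), (2, 3, 1), (3, 4, 1),
     (3, 5, 2), (3, 4, 5), (4, 5, 1), (4, 5, 2)]
edges = auxiliary_graph_edges(R)
print(sorted(tuple(sorted(e)) for e in edges))
# [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
print(is_connected({1, 2, 3, 4, 5}, edges))  # True
```

Only the pair $\{a, b\}$ of each triplet contributes an edge; the third taxon $c$ is ignored when building the graph.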

In general, the algorithm MTRT does the following steps.
$AG(\mathcal{R})$ is computed first. If $AG(\mathcal{R})$ is disconnected, then the set $L$
is partitioned into two non-empty sets $A$ and $B$ such that the set of
vertices in each connected component of $AG(\mathcal{R})$ is a subset of
either $A$ or $B$. Now, the triplet sets $\mathcal{R}_A$ and $\mathcal{R}_B$ are computed. We
set $\mathcal{R}_A := \mathcal{R}|_A$ and $\mathcal{R}_B := \mathcal{R}|_B$. If $AG(\mathcal{R})$ is connected, then MTRT
tries to find the minimum separating set $S$ and classifies the
connected components of $AG[L \setminus S]$ into two non-empty sets $A'$
and $B'$. It is well known that finding all minimum-size
separators is an NP-hard problem [3]. To find a minimum
separator, we use the AllMinSep algorithm [2]. AllMinSep computes
the set of all minimal separators of a graph $G$ in time $O(n^3 |\theta|)$,
where $|\theta|$ is the number of all minimal separators. AllMinSep first
produces an initial set of minimal separators $\theta$. Then, for each
$\varphi \in \theta$, a family of other minimal separators is generated and added
to $\theta$. This procedure is repeated until all minimal separators are
obtained; see [2] for more details. Since the number of all minimal
separators can be exponential and we do not need all of them,
we use AllMinSep with a small change that makes it
a greedy algorithm. Suppose the initial set of minimal separators $\theta$
has been obtained and $m$ is the size of the smallest separator in $\theta$.
Then, for each $\varphi \in \theta$, a family of other minimal separators $\theta'$ is
generated. Now, the separator $\varphi' \in \theta'$ is added to $\theta$ if $|\varphi'| < m$.

Let $S$ be a separator computed by AllMinSep and let the
connected components of $AG[L \setminus S]$ be classified into two non-empty
sets $A'$ and $B'$. We set $A = A' \cup S$ and $B = B' \cup S$. The
triplet sets corresponding to $A$ and $B$ are defined as follows:

$\mathcal{R}_A := \mathcal{R}|_A - \mathcal{R}(A, S)$ and $\mathcal{R}_B := \mathcal{R}|_B - \mathcal{R}(B, S)$.

Now, the algorithm recursively handles the sets $A$ and $B$ with triplet
sets $\mathcal{R}_A$ and $\mathcal{R}_B$ respectively. Let the MUL trees constructed by
MTRT for the sets $A$ and $B$ be $M_A$ and $M_B$ respectively. We
report the MUL tree $MT_{\{A,B\}}$ formed by connecting $M_A$ and $M_B$
to a common root. For the case that $AG(\mathcal{R})$ is connected, we
define $\mathcal{R}_A$ and $\mathcal{R}_B$ in this way because the members of $S$ are
repeated on both sides of the root. So, the set
$\{ab|c \in \mathcal{R} : \text{either } a, b \in S \text{ or } c \in S\}$ is consistent with
$MT_{\{A,B\}}$ and it is unnecessary to consider this set. It is obvious
that the output MUL tree of the algorithm is consistent with $\mathcal{R}$.
We now illustrate the steps of the algorithm MTRT by an
example.



Figure 10. Steps of MTRT. (a) The auxiliary graph corresponding to $\mathcal{R}$ = {12|3, 13|4, 23|1, 34|1, 35|2, 45|1, 45|3}, (b) $MT_{\{A,B\}}$, (c) The auxiliary graph
$AG(\mathcal{R}_A)$, (d) The auxiliary graph $AG(\mathcal{R}_B)$, (e) A smallest MUL tree produced by MTRT algorithm.

doi:10.1371/journal.pone.0103622.g010



PLOS ONE | www.plosone.org 






July 2014 | Volume 9 | Issue 7 | e103622 



Using Triplets to Infer the Multi-Labeled Trees 



MTRT($\mathcal{R}$, L)

Input: A leaf set L and a set of triplets $\mathcal{R}$ on L
Output: A small MUL tree MT consistent with $\mathcal{R}$

If |L| > 2
    Build $AG(\mathcal{R})$.
    If $AG(\mathcal{R})$ is connected
        Compute the separator S with minimum $a_S$ for $AG(\mathcal{R})$.
        Compute A', B'.
        Set A := A' ∪ S and B := B' ∪ S.
        Set $\mathcal{R}_A := \mathcal{R}|_A - \mathcal{R}(A, S)$ and $\mathcal{R}_B := \mathcal{R}|_B - \mathcal{R}(B, S)$.
    Else
        Compute A, B.
        Set $\mathcal{R}_A := \mathcal{R}|_A$ and $\mathcal{R}_B := \mathcal{R}|_B$.
    End if.
    $M_A$ := MTRT($\mathcal{R}_A$, $L|_A$).
    $M_B$ := MTRT($\mathcal{R}_B$, $L|_B$).
    Build $MT_{\{A,B\}}$ by considering a root p and connecting p to $M_A$ and $M_B$.
Else If |L| = 2
    In this case, MT is a tree with two leaves.
Else
    In this case, |L| = 1 and MT is a tree with one node.
End if.
End MTRT($\mathcal{R}$, L) and Return MT.



Figure 11. Pseudocode of the MTRT algorithm. 

doi:10.1371/journal.pone.0103622.g011 

Let $L$ = {1, 2, 3, 4, 5} and $\mathcal{R}$ = {12|3, 13|4, 23|1, 34|1, 35|2,
34|5, 45|1, 45|2} be the set of triplets over $L$. The auxiliary graph
corresponding to $\mathcal{R}$ is shown in Figure 10a. The set $S$ = {3} is the
minimum separator of $AG(\mathcal{R})$. Hence, $A$ = {1, 2, 3} and
$B$ = {3, 4, 5}. $MT_{\{A,B\}}$ is shown in Figure 10b. The induced
triplet sets for $A$ and $B$ are $\mathcal{R}|_A$ = {12|3, 23|1} and $\mathcal{R}|_B$ = {34|5}
respectively. Now, $\mathcal{R}(A, S)$ = {12|3} is removed from $\mathcal{R}|_A$ to
obtain $\mathcal{R}_A$. So, $\mathcal{R}_A$ = {23|1} and $\mathcal{R}_B$ = {34|5}. The auxiliary
graphs $AG(\mathcal{R}_A)$ and $AG(\mathcal{R}_B)$ are shown in Figure 10c and
Figure 10d respectively. Finally, the MUL tree produced by the
MTRT algorithm is shown in Figure 10e.
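The recursion traced in this example can be sketched in Python under several stated assumptions. A brute-force search for a smallest separator stands in for the greedy AllMinSep variant. When the graph is disconnected, or when components must be grouped after removing $S$, the first component goes to one side and all others to the other side, a choice the text leaves open. A complete auxiliary graph has no separator and is rejected here. Trees are nested tuples and a triplet $ab|c$ is `(frozenset({a, b}), c)`. All names are illustrative and this is not the authors' MATLAB code.

```python
from itertools import combinations

def t(a, b, c):
    """Triplet ab|c as (unordered pair, third taxon)."""
    return (frozenset((a, b)), c)

def induced(triplets, labels):
    """Triplets whose three taxa all lie in `labels`."""
    return {(ab, c) for ab, c in triplets if ab <= labels and c in labels}

def auxiliary_graph(triplets, labels):
    """Adjacency map of AG(R): a-b is an edge iff some ab|c is in R."""
    adj = {v: set() for v in labels}
    for ab, _ in triplets:
        a, b = tuple(ab)
        adj[a].add(b)
        adj[b].add(a)
    return adj

def components(adj, removed=frozenset()):
    """Connected components after deleting the vertices in `removed`."""
    seen, comps = set(removed), []
    for start in sorted(adj):
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v] - seen:
                seen.add(w)
                stack.append(w)
        comps.append(comp)
    return comps

def minimum_separator(adj):
    """Brute-force smallest separator (assumed stand-in for AllMinSep)."""
    nodes = sorted(adj)
    for size in range(1, len(nodes) - 1):
        for cand in combinations(nodes, size):
            if len(components(adj, frozenset(cand))) > 1:
                return set(cand)
    return None  # complete graph: nothing separates it

def mtrt(triplets, labels):
    labels = set(labels)
    if len(labels) == 1:
        return next(iter(labels))
    if len(labels) == 2:
        return tuple(sorted(labels))
    triplets = induced(triplets, labels)
    adj = auxiliary_graph(triplets, labels)
    comps = components(adj)
    if len(comps) > 1:
        # Disconnected case: first component versus the rest (assumption).
        a, b = comps[0], set().union(*comps[1:])
        r_a, r_b = induced(triplets, a), induced(triplets, b)
    else:
        s = minimum_separator(adj)
        if s is None:
            raise ValueError("complete auxiliary graph is not handled by this sketch")
        parts = components(adj, frozenset(s))
        a, b = parts[0] | s, set().union(*parts[1:]) | s

        def keep(tr):
            # Drop triplets whose pair lies in S or whose third taxon is in S.
            return not (tr[0] <= s or tr[1] in s)

        r_a = {tr for tr in induced(triplets, a) if keep(tr)}
        r_b = {tr for tr in induced(triplets, b) if keep(tr)}
    return (mtrt(r_a, a), mtrt(r_b, b))

R = {t(1, 2, 3), t(1, 3, 4), t(2, 3, 1), t(3, 4, 1),
     t(3, 5, 2), t(3, 4, 5), t(4, 5, 1), t(4, 5, 2)}
print(mtrt(R, {1, 2, 3, 4, 5}))  # ((1, (2, 3)), ((3, 4), 5))
```

On this input the sketch finds $S$ = {3}, recurses on {1, 2, 3} with {23|1} and on {3, 4, 5} with {34|5}, and returns a tree in which only label 3 is duplicated. Because an empty triplet set yields an edgeless graph, the disconnected branch also builds an arbitrary tree in that situation.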

We now describe two cases that may occur in some steps of the
algorithm:

Case 1. It is possible that at some step of the algorithm, for a leaf
label set $U$, $\mathcal{R}_U = \emptyset$. In this case, the triplets of an arbitrary tree on
$U$ are considered as $\mathcal{R}_U$. For instance, let
$\mathcal{R}$ = {12|3, 13|5, 23|4, 34|1, 35|2, 34|5, 45|1, 45|2}. The separator
of $AG(\mathcal{R})$ is $S$ = {3}. So, $A$ = {1, 2, 3}, $B$ = {3, 4, 5},
$\mathcal{R}|_A$ = {12|3} and $\mathcal{R}|_B$ = {34|5} and consequently, $\mathcal{R}_A = \emptyset$ and
$\mathcal{R}_B$ = {34|5}. Now, an arbitrary triplet set consistent with a tree on
leaf label set $A$ is considered as $\mathcal{R}_A$, for example $\mathcal{R}_A$ := {23|1}. If
the algorithm runs to the end, the MUL tree shown in Figure 10e
is produced.